ABSTRACT:

In searching for differentially expressed mRNAs/genes in a microarray experiment, the two commonly
used measures are the fold change and the t-test statistic (or the t-test p-value). The fold change
is a measure of differential expression "signal", whereas the t-statistic is a signal standardized
by the noise level, i.e., a "signal-to-noise" ratio. The fold change is an example of an absolute
effect size, whereas the t-statistic is a relative effect size. Both measures have shortcomings:
fold change ignores the noise and does not provide an estimate of chance probability; on the other
hand, the noise level, and thus the t-statistic, may not be estimated reliably when the sample size
is small. To make maximal use of the statistical information in the data, fold change and
t-statistic can be displayed simultaneously by volcano plots. Volcano plots allow easy comparison
between the "double filtering" gene selection criterion and "single filtering" or "joint filtering"
criteria. Colored volcano plots provide a flexible way to incorporate external information such as
pathway information of a gene. Stratified volcano plots permit examination of hidden patterns such
as systematic change of differential expression with the average expression level. Overall, the
volcano plot is a useful visual tool in microarray analysis.
1 Introduction



The microarray technology allows simultaneous measurement of the messenger RNA levels of
thousands of genes, and its adoption has dramatically changed the way biological and biomedical
research is carried out (Schena et al., 1998; Young, 2000; Butte, 2002; Slonim, 2002; Stoughton,
2005; Trevino et al., 2007). In particular, the more labor-intensive real-time PCR can be replaced
by microarray profiling in a preliminary round, as the general agreement between the two methods
is considered to be good (Etienne et al., 2004; Dallas et al., 2005; Morey et al., 2006). As an
emerging technology, there are still many issues to be worked out, such as the consistency among
different platforms (Kuo et al., 2002; Park et al., 2004; Larkin et al., 2005; Irizarry et al.,
2005; Draghici et al., 2006; Kuo et al., 2006; Patterson et al., 2006; Chen et al., 2007), batch
effect (Churchill, 2002; Baggerly et al., 2008; Kitchen et al., 2010), level of noise (Ioannidis,
2005; Ein-Dor et al., 2006), limit of dynamic range (Sharov et al., 2004), etc. However, with
better probe design (Yang and Speed, 2002), better data quality control (Shi et al., 2006), better
data reporting requirements (Ioannidis et al., 2009), better normalization schemes (Quackenbush,
2001; Vandesompele et al., 2002; Fujita et al., 2006; Steinhoff and Vingron, 2006; Stafford, 2008;
Autio et al., 2009), and better understanding of the study goals, these are not insurmountable
problems.

Analyzing large amounts of expression data from microarray experiments was thought to be a
major challenge in the early days, but this problem was overestimated. First, the amount of data
from thousands of genes and a hundred or so samples is still much smaller than, e.g., the data
generated by whole-genome association studies (Estrada et al., 2009) or next generation
sequencing (Schadt et al., 2010), and a moderately sized computer can handle the data without
problems. Second, no brand new statistical learning methods had to be invented, and existing
machine learning techniques could already extract meaningful information from the data (Hastie et
al., 2001). Third, the problem of a larger number of false positives due to the large number of
genes being profiled has been addressed and properly handled (Storey and Tibshirani, 2003; Reiner
et al., 2003; Storey, 2003; Pawitan et al., 2005). Fourth, in using multiple genes to construct a
classifier, the well known "large p, small n" problem (a large number of variables with a small
number of samples) can be solved by variable/subset/feature/model selection techniques
(…e et al.; Li and Yang, 2002; Ambroise and McLachlan, 2003; Li, 2006; Liao and Chin, 2007; Zhao
et al., 2010).



One of the most common applications of microarrays is "differential expression" profiling:
finding mRNAs/genes whose expression levels are very different under two conditions, e.g.,
with disease and being healthy. Not only could differentially expressed genes provide insight
into the biological processes involved in disease etiology, but they can also be used as
biomarkers for diagnosis (Golub et al., 1999; Hedenfalk et al., 2001; Dhanasekaran et al., 2001;
Adib et al., 2004; Yeatman, 2009) or prognosis (Pomeroy et al., 2002; van de Vijver et al., 2002;
Colman et al., 2010; Kim and Paik, 2010). The phrase "differential expression" means that the
averaged expression level of an mRNA/gene in one phenotype-specific group is much larger or
smaller than that in another group. However, the terms "average" and "larger/smaller" are open
to various interpretations.

There are at least two definitions of average: arithmetic mean or geometric mean. For a
random variable $x$, the arithmetic mean can be represented by $E[x]$, $\langle x \rangle$, or
$\bar{x}$, which is equal to $\frac{1}{n}\sum_{i=1}^{n} x_i$ (where $n$ is the sample size). The
geometric mean is defined by $(x_1 x_2 \cdots x_n)^{1/n}$. For fluorescence-light-intensity based
microarray data $x$, it is a common practice to logarithmically transform the data,
$x' = \log(x)$, because $x'$ fits a normal distribution better than $x$ does. Then the arithmetic
mean of $x'$ is actually equal to the logarithm of the geometric mean of $x$:

$$E[x'] = \frac{1}{n}\sum_{i=1}^{n} \log(x_i) = \log\,(x_1 x_2 \cdots x_n)^{1/n}.$$
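The identity between the mean of the logs and the log of the geometric mean can be checked numerically. The following is a minimal sketch in Python; the intensity values are made up for illustration and the function names are not from the paper.

```python
import math

def arithmetic_mean(values):
    """Plain arithmetic mean, (1/n) * sum of the values."""
    return sum(values) / len(values)

def geometric_mean(values):
    """Geometric mean, the n-th root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical fluorescence intensities for one probe (illustrative only).
intensities = [120.0, 450.0, 90.0, 3000.0, 610.0]

mean_of_logs = arithmetic_mean([math.log(v) for v in intensities])
log_of_geometric_mean = math.log(geometric_mean(intensities))

print(mean_of_logs, log_of_geometric_mean)
```

The two printed numbers agree up to floating-point rounding, while the arithmetic mean of the raw intensities is larger than their geometric mean because of the long right tail.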

Deciding "how much larger one group's average is compared to the other" is equally nontrivial.
Fold change and t-statistic are the two main choices for a measure of differential expression. In
the microarray analysis field, these two measures have been in and out of favor at various times.
Fold change had been commonly used before it was pointed out that it did not take the noise
into account (Chen et al., 1997; Baldi and Long, 2001). The t-statistic enjoyed its acceptance
until another round of papers suggested that genes selected by fold change are more consistent
among different microarray platforms than those selected by t-statistics (Shi et al., 2005, 2006;
Guo et al., 2006). This result triggered more comments on the relationship between
reproducibility and accuracy, and between biological and statistical signal (Witten and
Tibshirani, 2007).



If we recognize that both fold change and t-statistic have advantages and shortcomings,
then both pieces of information should be used in an analysis. The problem with fold change
is that the same fold change value is less impressive if the variance is large. Although the
t-statistic aims at taking the noise level into account, the practical problem is that the variance
may not be estimated reliably, especially when the sample size is small. The volcano plot, the
topic of this review, is a visual tool to display both fold change and t-statistic.

This article is organized as follows: Section 2 establishes a relationship between the fold
change and the t-statistic; Section 3 introduces volcano plots, which simultaneously display both
fold change and t-statistic; Section 4 introduces the modified t-statistic, which tends to reduce
the gene-to-gene variation of variance estimation; and Section 5 is the discussion section. One
microarray dataset is used throughout this paper, which consists of 37 chronic lymphocytic
leukemia (CLL) samples and 17 control samples. The expression profiling has been carried out
on the Illumina platform with 48804 probesets.



2 Fold change and t-statistic: signal and signal-to-noise ratio 



Fold change (FC) and t-statistic seem to be two very different quantities: one is an intuitive
and straightforward measure of differences, the other is rooted deeply in the field of statistics.
However, with logarithmic transformation there is a relationship between the two.

The need for logarithmic transformation can be illustrated by Fig. 1. Fig. 1 shows three
histograms of the fluorescence-light intensity $E$ of a microarray experiment, which is indicative
of the number of mRNA copies hybridized to the probe, and thus a measure of mRNA expression level:
(A) in regular scale, (B) in log-transformed x-axis scale, and (C) of $\log(E)$ itself. Without
the logarithmic transformation, the distribution of $E$ is very long-tailed and very skewed
(asymmetric). With the log transformation (or other similar transformations), even though
the distribution is still not a perfect normal distribution, it is much more normal-like.

There are other advantages of a log transformation, e.g., the variance is more stabilized and
does not tend to increase with the mean; it is also consistent with a psychophysics law relating
human sensation to the logarithm of the stimulus level (Fechner, 1860). Note that in a future
technology where the number of copies of mRNA can be read directly without fluorescence
light intensity as the intermediate (Geiss et al., 2008; Robinson et al., 2010), the role of log
transformation might be reconsidered. However, the decision on whether to log transform or
not is still based on the histogram of $E$ vs. that of $\log(E)$.

Figure 1: Histogram of expression levels of a microarray experiment (this unpublished dataset contains 37
case samples, 18 control samples, and 48804 probesets on the Illumina platform, normalized by "quantile
normalization"): (A) in linear scale. (B) x-axis in a log scale. (C) for log-transformed expression.
The simplest definition of FC is:

$$\mathrm{FC} = \frac{\langle E_1 \rangle}{\langle E_0 \rangle} \qquad (1)$$

where the arithmetic average is over the fluorescence-light intensity of samples in group 1 (e.g.
diseased group) and group 0 (e.g. control group). The logarithm of FC is:

$$\log(\mathrm{FC}) = \log\frac{\langle E_1 \rangle}{\langle E_0 \rangle}
  = \log\langle E_1 \rangle - \log\langle E_0 \rangle
  \approx \langle \log E_1 \rangle - \langle \log E_0 \rangle. \qquad (2)$$

Switching the order of averaging and log-transformation usually does not lead to identical
values, so the above expression is only approximately true. We can have a second definition of
FC, called FC':

$$\log(\mathrm{FC}') = \langle \log E_1 \rangle - \langle \log E_0 \rangle \qquad (3)$$

Fig. 2 shows that FC' is mostly similar to FC, and we do not distinguish the two definitions.
The same conclusion is also reached in (Witten and Tibshirani, 2007).

Figure 2: Comparison of two definitions of fold changes. The x is FC defined in Eq.(1), in log scale. The y is
the log(FC') defined in Eq.(3).
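To make the difference between the two definitions concrete, the sketch below computes both $\log_{10}(\mathrm{FC})$ (log of the ratio of means) and $\log_{10}(\mathrm{FC}')$ (difference of mean logs) on simulated intensities. The data are hypothetical log-normal values, not the CLL dataset; only the group sizes (37 and 17) are borrowed from the text, and the function names are illustrative.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def log10_fc(group1, group0):
    """log10 of FC: ratio of arithmetic means of raw intensities."""
    return math.log10(mean(group1) / mean(group0))

def log10_fc_prime(group1, group0):
    """log10 of FC': difference of the means of log10 intensities."""
    return (mean([math.log10(v) for v in group1])
            - mean([math.log10(v) for v in group0]))

rng = random.Random(0)
# Simulated log-normal intensities; group 1 is shifted up by 0.3 on the
# log10 scale, i.e. roughly a two-fold change (an assumption of this sketch).
group0 = [10 ** rng.gauss(2.0, 0.1) for _ in range(17)]
group1 = [10 ** rng.gauss(2.3, 0.1) for _ in range(37)]

print(log10_fc(group1, group0), log10_fc_prime(group1, group0))
```

With similar spread in both groups the two values are close; they can drift apart when one group is much more dispersed than the other, since the mean of raw intensities is pulled upward by large values.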



The t-test is an example of statistical testing whose goal is to compare any observed result
with chance events. The statistic used in the t-test (e.g., Snedecor and Cochran, 1989) is
the difference of arithmetic means in two groups divided ("standardized") by the estimated
standard deviation of that difference. The standard deviation of an estimated parameter (e.g.,
sample mean, sample variance) is often called the "standard error" (SE) (Snedecor and Cochran,
1989). One requirement for using the t-test is that values in the two groups roughly follow normal
distributions (with different means). As discussed above, we need to log transform the
fluorescence light intensity $E$ to have a normal-like distribution, so the t-statistic is:

$$t = \frac{\langle \log E_1 \rangle - \langle \log E_0 \rangle}
           {SE_{\langle \log E_1 \rangle - \langle \log E_0 \rangle}} \qquad (4)$$

The commonly used estimation of SE, due to Welch (Welch, 1947), assumes different variances
in group 1 and group 0:

$$t_{welch} = \frac{\langle \log E_1 \rangle - \langle \log E_0 \rangle}
                   {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}} \qquad (5)$$

where $s_1^2$ and $s_0^2$ are the estimated variances (of $\log(E)$) of group 1 and 0, and $n_1$,
$n_0$ are the numbers of samples in the two groups. For readers who are not familiar with
statistics, the following points can be used to understand the estimation of SE in the denominator
of Eq.(5): (1) SE is the square root of variance and variance is the square of SE; (2) the variance
of the sum or difference of two independent variables, $Var[x_1 \pm x_2]$, is the sum of the
individual variances $Var[x_1] + Var[x_2]$; (3) the SE of a sample mean is the sample standard
deviation divided by $\sqrt{n}$ (Snedecor and Cochran, 1989).
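As a concrete illustration of the unequal-variance t-statistic, the sketch below computes it with the Python standard library only. The function name and the log-scale values are hypothetical; `statistics.variance` uses the usual $n-1$ denominator.

```python
import math
import statistics

def welch_t(log_group1, log_group0):
    """Difference of means divided by sqrt(s1^2/n1 + s0^2/n0)."""
    n1, n0 = len(log_group1), len(log_group0)
    diff = statistics.mean(log_group1) - statistics.mean(log_group0)
    var1 = statistics.variance(log_group1)  # sample variance of group 1
    var0 = statistics.variance(log_group0)  # sample variance of group 0
    standard_error = math.sqrt(var1 / n1 + var0 / n0)
    return diff / standard_error

# Hypothetical log-transformed expression values for one gene.
diseased = [2.0, 2.2, 2.4]
control = [1.0, 1.1, 1.2, 1.3]

print(welch_t(diseased, control))
```

Here the mean difference is 1.05 and the squared standard error is $0.04/3 + (0.05/3)/4 = 0.0175$, so the statistic is $1.05/\sqrt{0.0175}$; swapping the two groups only flips the sign.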

The last point can be particularly hard to grasp for biologists, but we are dealing with
two different types of mean and standard deviation here. For a dataset with $n$ samples, the
mean, standard deviation, and variance are $E[x]$ (or $\langle x \rangle$, $\mu$), $Sd[x]$ (or
$\sqrt{Var[x]}$, $s$), and $Var[x]$ (or $s^2$). When the dataset is hypothetically replicated many
times, we can talk about the mean, standard deviation, and variance of the sample mean $\mu$, as
each replicate of the dataset may not be exactly the same. The dataset-mean is the same as the
sample mean, but the dataset-standard-error is $s/\sqrt{n}$ and the dataset-variance is $s^2/n$
(Snedecor and Cochran, 1989).
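The "hypothetical replication" argument can be mimicked by simulation. The sketch below draws many replicate datasets from a normal distribution and compares the spread of their sample means with $s/\sqrt{n}$; all numbers (mean, spread, number of replicates) are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
n = 25             # samples per hypothetical dataset
true_sd = 2.0      # spread of individual values within a dataset
replicates = 4000  # number of hypothetical replicate datasets

sample_means = []
for _ in range(replicates):
    dataset = [rng.gauss(10.0, true_sd) for _ in range(n)]
    sample_means.append(sum(dataset) / n)

# Spread of the sample mean across replicates vs. the s / sqrt(n) prediction.
sd_of_means = statistics.stdev(sample_means)
predicted_se = true_sd / n ** 0.5

print(sd_of_means, predicted_se)
```

The standard deviation of the individual values stays at about 2.0 no matter how large $n$ is, whereas the standard deviation of the sample mean shrinks as $1/\sqrt{n}$ (0.4 for $n = 25$ in this setup).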



This section establishes a relationship between $\log(\mathrm{FC})$ and the t-statistic: $t$ is
$\log(\mathrm{FC})$ (or more accurately, $\log(\mathrm{FC}')$) standardized by the noise level as
measured by the standard error. In the fields of statistical behavioral science, quantitative
psychology, epidemiology, and meta-analysis, there is a similar theme of unstandardized vs.
standardized effect size (Cohen, 1988). In the field of engineering, quantities like $t$ can be
called a signal-to-noise ratio (another definition of signal-to-noise ratio is based on a power
ratio, so a square operation is applied). In the field of applied probability, the mean divided by
the standard deviation is the inverse of the coefficient of variation.



3 Volcano plot and its basic use 



If the noise level is known or can be reliably estimated, we of course prefer a measure
of differential expression that takes the noise level into account, such as the t-statistic. In
reality, not only are small sample sizes an issue for variance estimation, but also, if systematic
error exists, we may not improve the situation by increasing the sample size. For example, it is
observed that the noise level during the hybridization stage is much higher than that during the
sample preparation or amplification stage (Tu et al., 2002). If a probe sequence for an mRNA
is highly represented in the genome, cross-hybridization can be a cause of error and variation.
However, the probability of this error does not seem to decrease with larger sample sizes.

Figure 3: (A) x-axis: t-statistic, y-axis: $-\log_{10}$(p-value) of t-test. (B) Volcano plot using t-statistic in y-axis
(x-axis is $\log_{10}$FC). (C) Volcano plot using $-\log_{10}$(p-value) in y-axis.

Facing this reality, we might just display and use both FC and t-statistic, and this is the
volcano plot. Volcano plot most often refers to the scatter plot with $-\log_{10}$(p-value) from
the t-test as the y-axis and $\log_{10}$FC as the x-axis (Jin et al., 2001; Cui and Churchill,
2003). However, the t-statistic and $-\log_{10}$(p-value) are highly correlated (see Fig. 3(A)),
and whether $t$ (Fig. 3(B)) or $-\log_{10}$(p-value) (Fig. 3(C)) is used in the y-axis, the outcome
is very similar. The reason why $t$ and the p-value from the t-test are not in one-to-one
correspondence (Fig. 3(A)) is that in determining the p-value, Welch's t distribution has a degree
of freedom parameter which also depends on the data (Pan, 2002).

Figure 4: Illustration of the double filtering criterion (upper-left and upper-right corners delineated by black
lines), FC-only single-gene criterion (lower-left and lower-right corners delineated by red lines), and t-test-only
single-gene criterion ("football goalpost" in the middle delineated by red lines).
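The data dependence of the degrees of freedom in the unequal-variance t-test can be illustrated with the standard Welch-Satterthwaite approximation. That formula is not quoted in this article, so the sketch below is an added illustration under that assumption; the variances are made up, and only the group sizes (37 and 17) are taken from the dataset description.

```python
def welch_degrees_of_freedom(var1, n1, var0, n0):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    a = var1 / n1
    b = var0 / n0
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n0 - 1))

# Two hypothetical genes measured on the same 37 cases and 17 controls.
df_equal_variances = welch_degrees_of_freedom(1.0, 37, 1.0, 17)
df_noisy_cases = welch_degrees_of_freedom(10.0, 37, 1.0, 17)

print(df_equal_variances, df_noisy_cases)
```

Because the two genes get different degrees of freedom, the same value of $t$ maps to slightly different p-values, which is why the points in a plot of $t$ against $-\log_{10}$(p-value) form a band rather than a single curve.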



The basic use of volcano plots is to check genes that could be selected by one differential
expression criterion but not the other. The familiar "double filtering" (Zhang and Cao, 2009)
used by many groups is to set the gene selection criterion by: (i) $|\log_{10}\mathrm{FC}| >
\log_{10}\mathrm{FC}_0$ and (ii) $|t| > t_0$. Equivalently, it can be defined as (i)
$|\log_{10}\mathrm{FC}| > \log_{10}\mathrm{FC}_0$ and (ii) p-value $< p_0$. Here $\mathrm{FC}_0$,
$t_0$, $p_0$ are preset threshold values for fold change, t-statistic, and t-test p-value.
The double filtering criterion corresponds to cutting out two outer rectangular regions in the
volcano plot (Fig. 4).
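The regions of the volcano plot defined by these criteria can be expressed as a small classification rule. The sketch below is illustrative only: the thresholds ($\mathrm{FC}_0 = 2$, $p_0 = 0.001$) and the region labels are assumptions, not values prescribed by the article. The example inputs reuse the FC and p-values quoted in the Figure 5 caption and for CLCF1 in the Discussion.

```python
import math

def classify_gene(log10_fc, p_value, fc0=2.0, p0=0.001):
    """Assign a gene to a volcano-plot region.

    fc0 and p0 are hypothetical thresholds chosen for this sketch.
    """
    passes_fc = abs(log10_fc) > math.log10(fc0)
    passes_test = p_value < p0
    if passes_fc and passes_test:
        return "double"     # outer corners: selected by double filtering
    if passes_fc:
        return "fc_only"    # large fold change, weaker test result
    if passes_test:
        return "test_only"  # strong test result, small fold change
    return "none"

print(classify_gene(math.log10(1.38), 7.7e-17))  # test_only
print(classify_gene(math.log10(2.66), 3e-3))     # fc_only
print(classify_gene(math.log10(0.22), 1.4e-16))  # double (down-regulated)
```

Taking the absolute value of $\log_{10}\mathrm{FC}$ treats up- and down-regulated genes symmetrically, which is why both outer corners of the plot are selected.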

The single filtering criterion, after removing the genes selected by the double criterion,
corresponds to rectangular regions along the two axes (Fig. 4). These genes are often not selected,
for reasonable arguments: (i) genes with large fold change but nevertheless an insignificant test
result may be caused by a few outliers with very large values in one group; (ii) genes with a good
test result (large t's and small t-test p-values) but low fold change could be false signals due to
low variance, which can be caused by batch effect (Leek et al., 2010) or low expression level (to
be discussed later).

Figure 5: (A) a gene with very good t-test result (p-value = $7.7 \times 10^{-17}$) but only moderate fold-change
(FC=1.38). (B) a gene with large fold-change (FC=2.66) but moderate t-test result (p-value = $3 \times 10^{-3}$).

The goal of using the double filtering criterion is to obtain a more robust result. The cost we
pay is that some real differentially expressed genes might be missed. The volcano plot allows us
to pick some genes from the single filtering region for further examination. Fig. 5 shows two
examples of genes selected by a single filtering criterion.

Fig. 5(A) is a gene selected by t-test p-value only ($p = 7.7 \times 10^{-17}$) while FC is lower
than 2 (FC=1.379). If the true variance is indeed low and we estimated it correctly from 17 control
samples, then we trust that this gene is significantly differentially expressed. Fig. 5(B) is
selected by FC only (FC=2.66) whereas the p-value is only $3 \times 10^{-3}$. This gene can still
be significantly differentially expressed if the large variance in the case group is due to
something else, e.g., disease subtypes. Statistical analysis alone should not be the only
foundation for selecting potentially relevant genes, and the volcano plot is a way to pick those
genes which otherwise might be missed.
4 Robust variance estimation, regularization, SAM, and joint filtering

The essential difference between FC and t-statistic is the consideration of statistical noise
(variance), but the real challenge is how to estimate the variance from a small number of
samples. Since variance is calculated around the mean, which is also estimated, one idea for
robust variance estimation is to iteratively remove outliers and then calculate the mean and
variance (Dozmorov and Lefkovits, 2009). The drawback of this approach is that the number of
samples used is gradually reduced.

Another idea for robust variance estimation is motivated by the typical "large p small n"
situation for a microarray experiment (Li and Yang, 2002). Though the sample size $n$ could
be small, the number of genes $p$ is nevertheless large, and that large number of genes makes
possible a reliable estimation of a common variance across all genes (Pan, 2002; Cui et al.,
2005), at least for the control group.



One main worry about variance estimation is that its value can be low due to the low
expression level. To avoid the estimated variance being too low, we may add a constant
"penalty" term $s_0$ to the sample-estimated standard deviation (Tusher et al., 2001):

$$t_{sam} = \frac{\langle \log E_1 \rangle - \langle \log E_0 \rangle}
                 {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}} + s_0} \qquad (6)$$

where the $s_0$ added outside the square root is the penalty constant, not the group-0 standard
deviation that appears inside the square root.

The penalty is also called "regularization", reflecting the prior belief (in the Bayesian
framework) that variance estimation across different genes should exhibit certain smooth behavior
(Baldi and Long, 2001; Hastie et al., 2001).



A popular software package called SAM (Significance Analysis of Microarrays) (http://www-stat.stanford.edu/
is based on Eq.(6). Another implementation of the same idea in R (R is a free software environment
for statistical computing: http://www.r-project.org/), siggenes, can be found at
http://www.bioconductor.org/packages/2.3/bioc/html/siggenes.html. In SAM, detailed procedures
are proposed to determine the $s_0$ value from the data. It is not clear whether this
procedure is unique or just one of many options. In practice, any small value of $s_0$, such
as the 5% percentile of standard deviations of all genes, can stabilize the variance estimation.
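The effect of the penalty term can be seen on a single low-variance gene. The sketch below is not the SAM or siggenes implementation; it only evaluates the regularized statistic for a given penalty, with a simple nearest-rank percentile as a stand-in for choosing $s_0$. The data are hypothetical, and the penalty value 0.0266 is borrowed from the 5% percentile quoted later for Figure 6.

```python
import math
import statistics

def standard_error(group1, group0):
    """sqrt(s1^2/n1 + s0^2/n0) computed on log-scale values."""
    return math.sqrt(statistics.variance(group1) / len(group1)
                     + statistics.variance(group0) / len(group0))

def regularized_t(group1, group0, penalty):
    """Mean difference divided by (standard error + constant penalty)."""
    diff = statistics.mean(group1) - statistics.mean(group0)
    return diff / (standard_error(group1, group0) + penalty)

def low_percentile(values, fraction):
    """Nearest-rank percentile; a simple stand-in for SAM's own procedure."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

# Hypothetical log-scale data for one gene with very small variance.
case = [2.05, 2.06, 2.04, 2.05]
control = [2.00, 2.01, 1.99, 2.00]

plain_t = regularized_t(case, control, penalty=0.0)      # ordinary statistic
damped_t = regularized_t(case, control, penalty=0.0266)  # with the penalty

print(plain_t, damped_t)
```

With a penalty of zero the function reduces to the unequal-variance t-statistic; with a positive penalty the statistic of this low-variance gene drops sharply, which is the intended protection against tiny variance estimates.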



A Bayesian derivation of the extra term in variance estimation is given in (Baldi and Long,
2001). In this framework, the mean and variance of a normal distribution (of $x' = \log(x)$) have
a prior distribution, as well as a posterior distribution after data are observed. For
convenience, they pick a functional form of the prior distribution so that the posterior will have
the same functional form (inverse Gamma distribution for the variance, normal distribution for the
mean). It can be shown that (the mean of) the posterior variance is a weighted sum of the prior
variance ($\sigma_0^2$) and the sample-estimated variance $s^2$ (Baldi and Long, 2001):

$$\sigma^2_{mean.of.posterior} = w s^2 + (1 - w)\,\sigma_0^2 \qquad (7)$$

where the weight $w$ tends to 1 for larger sample sizes ($w = (n-1)/(\nu_0 + n - 2)$, where
$\nu_0$ is the prior degree of freedom for the inverse Gamma distribution).

The modified or regularized variance $\sigma^2_{mean.of.posterior}$ in Eq.(7) has the effect of
drawing gene-specific variance towards the middle, since the change from the estimated variance:

$$\sigma^2_{mean.of.posterior} - s^2 = w s^2 + (1 - w)\,\sigma_0^2 - s^2
  = -(1 - w)(s^2 - \sigma_0^2) \qquad (8)$$

is negative when $s^2 > \sigma_0^2$ and positive when $s^2 < \sigma_0^2$. Note that variance is
added in Eq.(7), versus standard deviations being added in the denominator in Eq.(6). However, the
idea of adding an extra constant term is the same in Eq.(6) and Eq.(7).
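The shrinkage towards the prior variance can be illustrated numerically. The sketch below simply evaluates the weighted sum with the weight $w = (n-1)/(\nu_0 + n - 2)$ quoted above; the sample size, prior degrees of freedom, and variances are made-up values, and the function name is illustrative.

```python
def posterior_mean_variance(sample_var, prior_var, n, prior_df):
    """Weighted sum w*s^2 + (1-w)*sigma0^2 with w = (n-1)/(prior_df + n - 2)."""
    w = (n - 1) / (prior_df + n - 2)
    return w * sample_var + (1 - w) * prior_var

prior_var = 0.04  # hypothetical prior variance
n = 17            # e.g. number of control samples
prior_df = 10     # hypothetical prior degrees of freedom, giving w = 16/25

low_gene = posterior_mean_variance(0.01, prior_var, n, prior_df)   # pulled up
high_gene = posterior_mean_variance(0.09, prior_var, n, prior_df)  # pulled down

print(low_gene, high_gene)
```

With $w = 0.64$, a sample variance of 0.01 is raised to 0.0208 and a sample variance of 0.09 is lowered to 0.072, so both move towards the prior value 0.04 by the fraction $1 - w$ of their distance from it.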

In fact, there is a second extra term in variance estimation if the sample-estimated mean is
not a good estimate of the true mean (Baldi and Long, 2001). For this reason, it is reasonable to
consider removing outliers to make sure the mean is estimated robustly (Dozmorov and Lefkovits,
2009).

What is the relationship between robust variance estimation and volcano plots? FC can
be considered to be the special case when the variances of all genes are equal, the t-statistic of
course contains gene-specific variance, and $t_{sam}$ in Eq.(6) is somewhere in between. Rewriting
$|\langle \log E_1 \rangle - \langle \log E_0 \rangle|$ as $\delta$ and
$\sqrt{s_1^2/n_1 + s_0^2/n_0}$ as $s$, the regularized t-statistic in Eq.(6) can be split into two
terms (Zhang and Cao, 2009):

$$|t_{sam}| = \frac{\delta}{s + s_0}
  = \frac{1}{2(s + s_0)}\,\delta + \frac{s}{2(s + s_0)}\,\frac{\delta}{s}
  = \frac{1}{2(s + s_0)}\,|\log(\mathrm{FC}')| + \frac{s}{2(s + s_0)}\,|t_{welch}| \qquad (9)$$

In other words, $t_{sam}$ is a weighted sum of $\log(\mathrm{FC}')$ and the t-statistic.
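The decomposition of the regularized statistic into a fold-change part and a t-statistic part can be verified with a few lines of arithmetic. The sketch below uses hypothetical values for the mean difference and the standard error; the weights are $0.5/(s+s_0)$ and $0.5s/(s+s_0)$.

```python
def sam_weights(s, s0):
    """Weights on |log(FC')| and |t| in the decomposition."""
    return 0.5 / (s + s0), 0.5 * s / (s + s0)

def t_sam_direct(delta, s, s0):
    """Regularized statistic computed directly as delta / (s + s0)."""
    return delta / (s + s0)

def t_sam_weighted(delta, s, s0):
    """Same quantity as a weighted sum of delta and the ratio delta / s."""
    w, v = sam_weights(s, s0)
    return w * delta + v * (delta / s)

# Hypothetical gene: mean log difference 0.3, standard error 0.05, penalty 0.0266.
print(t_sam_direct(0.3, 0.05, 0.0266), t_sam_weighted(0.3, 0.05, 0.0266))
```

Both expressions give the same number because each of the two weighted terms contributes exactly half of $\delta/(s+s_0)$.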

A constant $t_{sam}$ value corresponds to a line in the volcano plot: $w|x| + vy = t_{0,sam}$,
where $x$ is $\log(\mathrm{FC}')$, $y$ is the absolute t-statistic, $w = 0.5/(s + s_0)$, and
$v = 0.5s/(s + s_0)$. The gene selection criterion by SAM (Eq.(6) and Eq.(9)) is
$|t_{sam}| > t_{0,sam}$. Because each gene has its own standard deviation value $s$, the threshold
can be gene-specific. We illustrate this important property of SAM in Fig. 6. The $s_0$ is set at
0.0266, which is the 5% percentile value of the $s$'s of all genes in our dataset. For a gene with
standard deviation of $s$ = 0.0238, 0.0266, 0.0283, 0.0425, 0.0871, 0.137 (1%, 5%, 10%, 50%, 75%,
90% percentiles), the $t_{sam} = 5$ threshold is represented by lines with various slopes (pink,
red, brown, purple, green, blue in Fig. 6).

Figure 6: All lines correspond to the constant $t_{sam} = 5$ value (Eq.(9)) with $s_0 = 0.0266$ being the 5% percentile
standard deviation of all 48804 probes/genes. The six lines are for genes with different standard deviations:
$s$ = 0.0238, 0.0266, 0.0283, 0.0425, 0.0871, 0.137 (1%, 5%, 10%, 50%, 75%, 90% percentiles; pink, red, brown,
purple, green and blue).

The lines for low-variance genes have steeper slopes, indicating that FC plays a more
important role in differential expression gene selection. On the other hand, for high-variance
genes, the threshold lines have flatter slopes, indicating that the t-test result is more
important. As discussed in the previous section (and Fig. 5), low-variance genes tend to have low
FC values and high-variance genes tend to have less significant test results, so the consequence
of using SAM is to counterbalance this trend and to obtain a more robust outcome. We also note
that the SAM-based gene selection regions in Fig. 6 (complementary to triangles) are very different
from those of the double-filtering criterion (rectangles in Fig. 4). This can also be called a
"joint filtering" criterion (Zhang and Cao, 2009).
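The link between a gene's variance and the steepness of its threshold line follows from rearranging the line equation $w|x| + vy = t_{0,sam}$ with $w = 0.5/(s+s_0)$ and $v = 0.5s/(s+s_0)$. The short derivation below is a sketch based only on those two definitions:

```latex
\begin{align*}
y &= \frac{t_{0,sam}}{v} - \frac{w}{v}\,|x| \\
  &= \frac{2\,t_{0,sam}\,(s + s_0)}{s} - \frac{1}{s}\,|x| .
\end{align*}
```

The magnitude of the slope is $1/s$, so a smaller $s$ gives a steeper line, and the intercept $2\,t_{0,sam}(1 + s_0/s)$ also grows as $s$ shrinks.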
Figure 7: Blue, red, green, black dots are the top 100 probes/genes selected by $t_{sam}$, FC, t-statistic, and
p-value of t-test. (A) on volcano plot, x: log(FC'), y: $-\log_{10}$(p-value). (B) x: mean of all samples, y: standard
deviation of all samples. (C) x: mean of control samples, y: standard deviation of control samples. (D) x:
mean of diseased samples, y: standard deviation of diseased samples.

Fig. 7(A) compares the top 100 genes selected by SAM (regularized t) (blue) with those
selected by FC (red), t-test p-value (black), and the t-statistic itself (green). Although there
are certain overlaps among different selection criteria, SAM is able to pick up genes that are not
selected by either FC or t-test p-value alone.

To address another question, whether the t-test criterion tends to select genes with low
variance and low expression level, Fig. 7(B)(C)(D) show the standard deviation (y-axis) vs.
mean (x-axis) for all samples, control samples only, and diseased (CLL) samples only. Indeed,
the FC-based criterion tends to select genes with high variances, whereas the t-test based
criterion selects relatively low variance genes. SAM achieves a balance between the two criteria,
and selects genes with intermediate variance values. On the other hand, there is no strong evidence
that any selection criterion tends to select low expression level genes.
Figure 8: Stratified volcano plot: probes/genes on chromosome 6 are marked by red, and those with "cytokine"
in the gene annotation are marked by blue.



5 Discussion 



The idea and the use of volcano plots can be expanded in several directions. First of all,
as a 2-dimensional plot, with potentially interesting genes scattered outward, one can examine
any external information by introducing colors. If that external piece of information is relevant
to differential expression, we can easily recognize the fact by a visual impression of the plot.
This coloring of a volcano plot can be called a "stratified volcano plot". One example is to label
all probes/genes that belong to a particular pathway, cellular component, function, or process
coded in GO (gene ontology) categories (Ashburner et al., 2000).

Fig. 8 illustrates a stratified volcano plot by marking 1614 probes/genes that are located
on chromosome 6 (red), and 31 probes/genes whose annotation contains the word "cytokine" (blue).
From the stratified volcano plot, we can easily identify interesting candidate genes involving
cytokines such as CLCF1 (cardiotrophin-like cytokine factor 1, p-value = $1.4 \times 10^{-16}$,
FC' = 0.22, down-regulated), SOCS2 (suppressor of cytokine signaling 2, FC' = 0.11, p-value =
$3.8 \times 10^{-8}$, down-regulated), SOCS3 (suppressor of cytokine signaling 3, FC' = 0.28,
p-value = $6.2 \times 10^{-8}$, down-regulated), etc.

In a work-in-progress, we have developed an R package to color volcano plots using the
average expression levels (Hua et al., 2011). In the program, we introduced an interactive
feature for users to click a probe/gene on the volcano plot to show the gene names or other
information.

Secondly, the idea of simultaneously displaying a noise-level-standardized signal and an
unstandardized one can be useful beyond the microarray field. In genetic association studies, the
association signal of a single-nucleotide polymorphism (SNP) is usually measured by two
quantities. One is the odds-ratio (OR) of the 2-by-2 count table with disease status as row and
two alleles as column. OR is not standardized by the noise level or sample size, though the
95% confidence interval of OR does become narrower for larger sample sizes, and thus for lower
levels of chance events (Li, 2006). On the other hand, the chi-square statistic or the p-value of
the chi-square ($\chi^2$) test depends strongly on sample size, and thus on chance event
probability. In fact, the chi-square statistic is proportional to the total number of samples for
a SNP that contains an association signal.
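The contrast between the two association measures can be shown on a toy 2-by-2 allele count table. The sketch below uses the standard Pearson chi-square formula for a 2-by-2 table without continuity correction (an assumption of this illustration, since the article does not spell out the formula); the counts are made up.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2-by-2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2-by-2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical allele counts: cases (60, 40), controls (45, 55).
small_study = (60, 40, 45, 55)
# The same allele proportions with twice as many samples.
large_study = (120, 80, 90, 110)

print(odds_ratio(*small_study), odds_ratio(*large_study))
print(chi_square_2x2(*small_study), chi_square_2x2(*large_study))
```

Doubling every count leaves the odds ratio unchanged but doubles the chi-square statistic, which is the sense in which OR plays the role of the unstandardized signal and the chi-square statistic that of the standardized one.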

Besides using OR in the x-axis (in log scale), another choice is to use the allele frequency
difference between the case and control groups. Denoting the four counts in the 2-by-2 table in
case-control association analysis as $a, b, c, d$, $\log_{10}\mathrm{OR}$ is
$\log_{10}(ad) - \log_{10}(bc)$, whereas the allele frequency difference is
$a/(a+b) - c/(c+d) = (ad - bc)(a+b)^{-1}(c+d)^{-1}$. In other words, the difference
between the two choices is whether $ad$ and $bc$ are compared in the logarithmic or regular scale.
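The algebra relating the two x-axis choices can be checked directly. The sketch below computes $\log_{10}\mathrm{OR}$ and the allele frequency difference from the same hypothetical counts and confirms that the difference equals $(ad - bc)/((a+b)(c+d))$; function names and counts are illustrative.

```python
import math

def log10_odds_ratio(a, b, c, d):
    """log10(ad) - log10(bc): ad and bc compared on the logarithmic scale."""
    return math.log10(a * d) - math.log10(b * c)

def allele_frequency_difference(a, b, c, d):
    """a/(a+b) - c/(c+d): allele frequency in cases minus that in controls."""
    return a / (a + b) - c / (c + d)

# Hypothetical allele counts: cases (60, 40), controls (45, 55).
a, b, c, d = 60, 40, 45, 55

print(log10_odds_ratio(a, b, c, d))
print(allele_frequency_difference(a, b, c, d), (a * d - b * c) / ((a + b) * (c + d)))
```

Both quantities are zero exactly when $ad = bc$, so they agree on which SNPs carry no signal and differ only in how they scale a non-zero signal.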



It is rare to find a genetic association paper that applies volcano plots (Sirota et al., 2009;
Miclaus et al., 2010). We believe that many extensions and applications of volcano plots in
microarray analysis can be equally useful in genetic association analysis. Examples include the
joint filtering criterion, the stratified volcano plot colored by external pieces of information,
and the uncovering of systematic patterns when the coloring is based on other statistical
information. We have found that the location of a SNP on the volcano plot is intrinsically related
to its minor allele frequency. This will provide further insight into how one should balance the
chi-square test result and the odds-ratio in selecting genetically associated genes.

In conclusion, the volcano plot displays both noise-level-standardized and unstandardized signal
concerning differential expression of mRNA levels. Joint filtering has a simple geometric
interpretation in the volcano plot, and its advantage over the double filtering criterion can be
easily understood. As a scatter plot, the volcano plot can incorporate other external information,
such as gene annotation, to aid the hypothesis generating process concerning a disease.